Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: log local TCP address #454

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

maxmoehl
Copy link
Member

@maxmoehl maxmoehl commented Nov 8, 2024

Summary

When troubleshooting issues related to keepalive it is often useful to be able to correlate multiple requests made over the same connection. A first step was to introduce the information on whether the request was made on a re-used connection or not which was done with a7dbf84. This commit adds the local address of the TCP connection which adds the capability to distinguish between multiple, parallel keep-alive connections by looking at the client port of gorouter for the backend connection.

Backward Compatibility

Breaking Change? Yes

@maxmoehl
Copy link
Member Author

maxmoehl commented Nov 8, 2024

The big question is: how do we proceed with such log changes which could be considered breaking? I don't think it's a healthy practice to keep adding flags for each and every individual log field we add 🤔

@ameowlia
Copy link
Member

ameowlia commented Nov 8, 2024

The big question is: how do we proceed with such log changes which could be considered breaking? I don't think it's a healthy practice to keep adding flags for each and every individual log field we add 🤔

I agree, I hate having to add so many config properties. But sadly, I do think that this is a breaking change. I think of all the logs that operators look at, I think they DO look at access logs. And they are not a nice json format, they are an unwieldy string mess that you would have to write a custom parser for.

Here are the options I see...

  1. Add the new field with no config property - risk breaking operators and facing their wrath (we should not do this)
  2. Add the new field to the end of the log line - still risk breaking operators and facing their wrath, but maybe slightly less risk (we should not do this either)
  3. Put it behind yet another config property - this fills up our code with config properties
  4. Develop a more advanced config property that allows users to "opt in" to certain fields by name 1 by 1. So that way we can add new fields, but they can always opt in / out of certain ones. - Fixes this problem for the future. But this is more work. And this is just a config property for everything! Will operators even opt in to any of them!?
  5. (From @geofffranks) add an additional access-log file that's json-based for people to migrate to, and we deprecate the old one eventually. When it is json based we can add new properties without it being a breaking change. - Personally I love this idea, but it is definitely more work.

So, in summary, I would say:

  • sadly this is a breaking change (whomp)
  • if you could chose one of options 3-5 that would be great

@maxmoehl
Copy link
Member Author

maxmoehl commented Nov 8, 2024

sadly this is a breaking change (whomp)

That's what I was expecting 😄

Develop a more advanced config property that allows users to "opt in" to certain fields by name 1 by 1. So that way we can add new fields, but they can always opt in / out of certain ones. - Fixes this problem for the future. But this is more work. And this is just a config property for everything! Will operators even opt in to any of them!?

That's probably what I will be going for in this PR to prevent accumulating further config flags. Just a flat list, and if anyone needs special properties they can be added and exposed via this mechanism without breaking anyone.

add an additional access-log file that's json-based for people to migrate to, and we deprecate the old one eventually. When it is json based we can add new properties without it being a breaking change. - Personally I love this idea, but it is definitely more work.

This is my preferred long-term solution, but it will require quite some effort and it's not clear when we will be able to invest.

@maxmoehl maxmoehl force-pushed the maxmoehl/add-local-addr branch 2 times, most recently from eac4b22 to 20a7867 Compare November 26, 2024 10:34
@maxmoehl
Copy link
Member Author

I've added a new property which allows logging of additional fields in a generic way. The release PR will follow once we agree on this PR.

@maxmoehl maxmoehl marked this pull request as ready for review November 26, 2024 10:40
@maxmoehl maxmoehl requested a review from a team as a code owner November 26, 2024 10:40
@@ -314,6 +314,27 @@ var _ = Describe("AccessLogRecord", func() {
Eventually(r).Should(Say(`x_cf_routererror:"-"`))
})
})

Context("when extra_fields is set to [local_address]", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an test for...

  • when extra_fields does not contain local_address, then the local address is not tested.
  • when extra_fields contains "meow" it ... does not error? logs? idk does something reasonable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added two more test cases.

Copy link
Member

@ameowlia ameowlia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Can you add a PR to routing-release so I can deploy and test everything at once?

@maxmoehl
Copy link
Member Author

I've opened cloudfoundry/routing-release#448 as a draft as I'm still adding tests.

@maxmoehl maxmoehl force-pushed the maxmoehl/add-local-addr branch 2 times, most recently from f9856b8 to 06efc11 Compare January 17, 2025 12:34
When troubleshooting issues related to keepalive it is often useful to
be able to correlate multiple requests made over the same connection. A
first step was to introduce the information on whether the request was
made on a re-used connection or not which was done with a7dbf84. This
commit adds the local address of the TCP connection which adds the
capability to distinguish between multiple, parallel keep-alive
connections by looking at the client port of gorouter for the backend
connection.
@maxmoehl maxmoehl force-pushed the maxmoehl/add-local-addr branch from 06efc11 to 085cb3b Compare January 17, 2025 13:24
@maxmoehl
Copy link
Member Author

maxmoehl commented Jan 17, 2025

The past weeks have been a bit busy, I finally found the time to clean this up. The rr PR is also ready now. Unfortunately test execution on my end is still hindered by cloudfoundry/routing-release#450. I ran the package tests using go test and they seem fine but I can't run the whole shebang, could you please do that? Thank you!

I'm currently also deploying this to one of our environments.

Edit: works!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

2 participants